Computer-aided visualization of expression comparison

ABSTRACT

Innovative systems and methods for visualizing information collected from analyzing samples are provided. The samples may include nucleic acids, proteins, or other polymers. Gene expression level as determined from analysis of a nucleic acid sample is one possible analysis result that may be visualized. In one embodiment, a computer system may display the expression levels of multiple genes simultaneously in a way that facilitates user identification of genes whose expression is significant to a characteristic such as disease or resistance to disease. Additionally, the computer system may facilitate display of further information about relevant genes once they are identified.

BACKGROUND OF THE INVENTION

[0001] The present invention relates to the field of computer systems. More specifically, the present invention relates to computer systems for visualizing analysis results.

[0002] Devices and computer systems for forming and using arrays of materials on a substrate are known. For example, PCT Publication No. WO 92/10588, incorporated herein by reference for all purposes, describes techniques for sequencing or sequence checking nucleic acids and other materials. Arrays for performing these operations may be formed according to the methods of, for example, the pioneering techniques disclosed in U.S. Pat. Nos. 5,143,854 and 5,593,839, both incorporated herein by reference for all purposes.

[0003] According to one aspect of the techniques described therein, an array of nucleic acid probes is fabricated at known locations on a substrate or chip. A fluorescently labeled nucleic acid is then brought into contact with the chip and a scanner generates an image file (which is processed into a cell file) indicating the locations where the labeled nucleic acids bound to the chip. Based upon the cell file and identities of the probes at specific locations, it becomes possible to extract information such as the monomer sequence of DNA or RNA. Such systems have been used to form, for example, arrays of DNA that may be used to study and detect mutations relevant to cystic fibrosis, the P53 gene (relevant to certain cancers), HIV, and other genetic characteristics.

[0004] Computer-aided techniques for monitoring gene expression using such arrays of probes have also been developed as disclosed in U.S. patent application Ser. No. 08/828,952 (Attorney Docket No. 16528X-028900US) and PCT Publication No. WO 97/10365 (Attorney Docket No. 16528X-017110PC), the contents of which are herein incorporated by reference. Many disease states are characterized by differences in the expression levels of various genes either through changes in the copy number of the genetic DNA or through changes in levels of transcription (e.g., through control of initiation, provision of RNA precursors, RNA processing, etc.) of particular genes. For example, losses and gains of genetic material play an important role in malignant transformation and progression. Furthermore, changes in the expression (transcription) levels of particular genes (e.g., oncogenes or tumor suppressors) serve as signposts for the presence and progression of various cancers.

[0005] It is desirable to identify genes having expression levels relevant to diagnosis of a diseased state by analyzing the expression levels of large numbers of genes in both diseased and normal individuals. Methods for collecting the expression level information have been developed. However, the user interfaces for gene expression monitoring systems that have been developed until now are designed to clearly present the expression of particular pre-selected genes. A user seeking to identify, e.g., an oncogene or a tumor suppressor gene, must individually review the expression level of large numbers of genes and compare the expression levels between diseased and normal individuals. What is needed is a user interface that takes advantage of collected gene expression information to help the user identify particular genes of interest.

SUMMARY OF THE INVENTION

[0006] The present invention provides innovative systems and methods for visualizing information collected from analyzing samples. The samples may include nucleic acids, proteins, or other polymers. Gene expression level as determined from analysis of a nucleic acid sample is one possible analysis result that may be visualized. In one embodiment, a computer system may display the expression levels of multiple genes simultaneously in a way that facilitates user identification of genes whose expression is significant to a characteristic such as disease or resistance to disease. Additionally, the computer system may facilitate display of further information about relevant genes once they are identified.

[0007] A first aspect of the invention provides a computer-implemented method for presenting expression level information as collected from first and second samples. The method includes steps of: displaying a first axis corresponding to expression level in the first sample, and displaying a second axis substantially perpendicular to the first axis, the second axis corresponding to expression level in the second sample. The method further includes a step of: for a selected expressed sequence, displaying a mark at a position. The position is selected relative to the first axis in accordance with an expression level of the selected expressed sequence in the first sample and relative to the second axis in accordance with an expression level of the selected expressed sequence in the second sample. A particularly useful application is displaying many marks simultaneously for many selected genes to discover which ones of the selected genes may be relevant to the characteristic.

[0008] A second aspect of the invention provides a computer-implemented method of presenting sample analysis information. The method includes steps of: displaying a first axis corresponding to a concentration of a compound in a first sample as determined by monitoring binding of the compound to a selected polymer having binding affinity to the compound, and displaying a second axis substantially perpendicular to the first axis. The second axis corresponds to a concentration of the compound in the second sample as determined by monitoring binding of the compound to the selected polymer. The method further preferably includes a step of displaying a mark at a position. The position is selected relative to the first axis in accordance with the concentration in the first sample and relative to the second axis in accordance with the concentration in the second sample.

[0009] A further understanding of the nature and advantages of the inventions herein may be realized by reference to the remaining portions of the specification and the attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0010] FIG. 1 illustrates an example of a computer system that may be used to execute software embodiments of the present invention.

[0011] FIG. 2 shows a system block diagram of a typical computer system.

[0012] FIG. 3 illustrates an overall system for forming and analyzing arrays of polymers including biological materials such as DNA or RNA.

[0013] FIG. 4 is an illustration of an embodiment of software for the overall system.

[0014] FIG. 5 shows a flowchart of a process of monitoring the expression of a gene by comparing hybridization intensities of pairs of perfect match and mismatch probes.

[0015] FIG. 6 shows a screen display illustrating gene expression levels for multiple genes as collected from both normal and diseased tissue.

[0016] FIGS. 7A-7B show screen displays illustrating information about a particular gene selected from the display of FIG. 6.

DESCRIPTION OF SPECIFIC EMBODIMENTS

[0017] The present invention provides innovative methods of monitoring and visualizing gene expression. In the description that follows, the invention will be described in reference to preferred embodiments. However, the description is provided for purposes of illustration and not for limiting the spirit and scope of the invention.

[0018] FIG. 1 illustrates an example of a computer system that may be used to execute software embodiments of the present invention. FIG. 1 shows a computer system 1 which includes a monitor 3, screen 5, cabinet 7, keyboard 9, and mouse 11. Mouse 11 may have one or more buttons such as mouse buttons 13. Cabinet 7 houses a CD-ROM drive 15 and a hard drive (not shown) that may be utilized to store and retrieve software programs including computer code incorporating the present invention. Although a CD-ROM 17 is shown as the computer readable medium, other computer readable media including floppy disks, DRAM, hard drives, flash memory, tape, and the like may be utilized. Cabinet 7 also houses familiar computer components (not shown) such as a processor, memory, and the like.

[0019] FIG. 2 shows a system block diagram of computer system 1 used to execute software embodiments of the present invention. As in FIG. 1, computer system 1 includes monitor 3 and keyboard 9. Computer system 1 further includes subsystems such as a central processor 50, system memory 52, I/O controller 54, display adapter 56, removable disk 58, fixed disk 60, network interface 62, and speaker 64. Removable disk 58 is representative of removable computer readable media like floppies, tape, CD-ROM, removable hard drive, flash memory, and the like. Fixed disk 60 is representative of an internal hard drive or the like. Other computer systems suitable for use with the present invention may include additional or fewer subsystems. For example, another computer system could include more than one processor 50 (i.e., a multi-processor system) or memory cache.

[0020] Arrows such as 66 represent the system bus architecture of computer system 1. However, these arrows are illustrative of any interconnection scheme serving to link the subsystems. For example, display adapter 56 may be connected to central processor 50 through a local bus or the system may include a memory cache. Computer system 1 shown in FIG. 2 is but an example of a computer system suitable for use with the present invention. Other configurations of subsystems suitable for use with the present invention will be readily apparent to one of ordinary skill in the art. In one embodiment, the computer system is an IBM compatible personal computer.

[0021] The VLSIPS™ and GeneChip™ technologies provide methods of making and using very large arrays of polymers, such as nucleic acids, on very small chips. See U.S. Pat. No. 5,143,854 and PCT Patent Publication Nos. WO 90/15070 and 92/10092, each of which is hereby incorporated by reference for all purposes. Nucleic acid probes on the chip are used to detect complementary nucleic acid sequences in a sample nucleic acid of interest (the “target” nucleic acid).

[0022] It should be understood that the probes need not be nucleic acid probes but may also be other receptors, such as antibodies, or polymers such as peptides. Peptide probes may be used to detect the concentration of other peptides, proteins, or other compounds in a sample. The probes must be carefully selected to have binding affinity to the compound whose concentration they are to be used to measure.

[0023] In one embodiment, the present invention provides methods of visualizing information relating to the concentration of compounds in a sample as measured by monitoring affinity of the compounds to probes. In a particular application, the concentration information is generated by analysis of hybridization intensity files for a chip containing hybridized nucleic acid probes. The hybridization of a nucleic acid sample to certain probes may represent the expression level of one or more genes or expressed sequence tags (ESTs). The expression level of a gene or EST is herein understood to be the concentration within a sample of mRNA or protein that would result from the transcription of the gene or EST.

[0024] Expression level information visualized by virtue of the present invention need not be obtained from probes but may originate from any source. If the expression information is collected from a probe array, the probe array need not meet any particular criteria for size and density. Furthermore, the present invention is not limited to visualizing fluorescent measurements of binding events such as hybridization but may be readily utilized to visualize other measurements.

[0025] Concentrations of compounds other than nucleic acids may be visualized according to one embodiment of the present invention. For example, a probe array may include peptide probes which may be exposed to protein samples, polypeptide samples, or other compounds which may or may not bind to the peptide probes. By appropriate selection of the peptide probes, one may detect the presence or absence of particular compounds which would bind to the peptide probes.

[0026] For purposes of illustration, the present invention is described as being part of a system that designs a chip mask, synthesizes the probes on the chip, labels nucleic acids from a target sample, and scans the hybridized probes. Such a system is set forth in U.S. Pat. No. 5,571,639 which is hereby incorporated by reference for all purposes. However, the present invention may be used separately from the overall system for analyzing data generated by such systems, such as at remote locations, or for visualizing the results of other systems for generating expression information, or for visualizing concentrations of polymers other than nucleic acids.

[0027] FIG. 3 illustrates a computerized system for forming and analyzing arrays of biological materials such as RNA or DNA. A computer 100 is used to design arrays of biological polymers such as RNA or DNA. The computer 100 may be, for example, an appropriately programmed IBM-compatible personal computer running Windows NT including appropriate memory and a CPU as shown in FIGS. 1 and 2. The computer system 100 obtains inputs from a user regarding characteristics of a gene of interest, and other inputs regarding the desired features of the array. Optionally, the computer system may obtain information regarding a specific genetic sequence of interest from an external or internal database 102 such as GenBank. The output of the computer system 100 is a set of chip design computer files 104 in the form of, for example, a switch matrix, as described in PCT application WO 92/10092, and other associated computer files.

[0028] The chip design files are provided to a system 106 that designs the lithographic masks used in the fabrication of arrays of molecules such as DNA. The system or process 106 may include the hardware necessary to manufacture masks 110 and also the computer hardware and software 108 necessary to lay the mask patterns out on the mask in an efficient manner. As with the other features in FIG. 3, such equipment may or may not be located at the same physical site, but is shown together for ease of illustration in FIG. 3. The system 106 generates masks 110 or other synthesis patterns such as chrome-on-glass masks for use in the fabrication of polymer arrays.

[0029] The masks 110, as well as selected information relating to the design of the chips from system 100, are used in a synthesis system 112. Synthesis system 112 includes the necessary hardware and software used to fabricate arrays of polymers on a substrate or chip 114. For example, synthesizer 112 includes a light source 116 and a chemical flow cell 118 on which the substrate or chip 114 is placed. Mask 110 is placed between the light source and the substrate/chip, and the two are translated relative to each other at appropriate times for deprotection of selected regions of the chip. Selected chemical reagents are directed through flow cell 118 for coupling to deprotected regions, as well as for washing and other operations. All operations are preferably directed by an appropriately programmed computer 119, which may or may not be the same computer as the computer(s) used in mask design and mask making.

[0030] The substrates fabricated by synthesis system 112 are optionally diced into smaller chips and exposed to marked targets. The targets may or may not be complementary to one or more of the molecules on the substrate. The targets are marked with a label such as a fluorescein label (indicated by an asterisk in FIG. 3) and placed in scanning system 120. Scanning system 120 again operates under the direction of an appropriately programmed digital computer 122, which also may or may not be the same computer as the computers used in synthesis, mask making, and mask design. The scanner 120 includes a detection device 124 such as a confocal microscope or CCD (charge-coupled device) that is used to detect the location where labeled target has bound to the substrate. The output of scanner 120 is an image file(s) 124 indicating, in the case of fluorescein labeled target, the fluorescence intensity (photon counts or other related measurements, such as voltage) as a function of position on the substrate. Since higher photon counts will be observed where the labeled target has bound more strongly to the array of polymers, and since the monomer sequence of the polymers on the substrate is known as a function of position, it becomes possible to determine the sequence(s) of polymer(s) on the substrate that are complementary to the target.

[0031] The image file 124 is provided as input to an analysis system 126 that incorporates the visualization and analysis methods of the present invention. Again, the analysis system may be any one of a wide variety of computer systems. The present invention provides various methods of analyzing and visualizing the chip design files and the image files, providing appropriate output 128. The chip design need not include any particular number of probes. It should be understood that the present invention does not require any particular source of expression level information.

[0032] FIG. 4 provides a simplified illustration of the overall software system used in the operation of one embodiment of the invention. As shown in FIG. 4, the system first identifies the nucleotide sequence(s) or targets that would be of interest in a particular expression level analysis at step 202. The sequences of interest correspond to RNA transcripts of one or more genes, ESTs or nucleic acids derived from the mRNA transcripts. Sequence selection may be provided via manual input of text files or may be from external sources such as GenBank.

[0033] At step 204 the system evaluates the sequences of interest to determine or assist the user in determining which probes would be desirable on the chip, and provides an appropriate “layout” on the chip for the probes. The process of selecting probes for an expression level analysis is explained in PCT Publication No. WO 97/10365, the contents of which are herein incorporated by reference. An alternative probe selection process that does not require prior knowledge of sequences of interest is explained in PCT Publication No. WO 97/27317 (Attorney Docket No. 18547-019410PC), the contents of which are herein incorporated by reference. Further general background on probe selection is found in PCT Publication No. WO 95/11995 (Attorney Docket No. 18547-004111PC) and PCT Publication No. WO 97/29212 (Attorney Docket No. 18547-018540PC), the contents of which are herein incorporated by reference. The term “perfect match probe” refers to a probe that has a sequence that is perfectly complementary to a particular target sequence. The test probe is typically perfectly complementary to a portion (subsequence) of the target sequence. The term “mismatch control” or “mismatch probe” refers to probes whose sequence is deliberately selected not to be perfectly complementary to a particular target sequence. For each mismatch (MM) control in an array there typically exists a corresponding perfect match (PM) probe that is perfectly complementary to the same particular target sequence.

[0034] The process compares hybridization intensities of pairs of perfect match and mismatch probes that are preferably covalently attached to the surface of a substrate or chip. Most preferably, the nucleic acid probes have a density greater than about 60 different nucleic acid probes per 1 cm² of the substrate.

[0035] Initially, nucleic acid probes are selected that are complementary to the target sequence. These probes are the perfect match probes. Another set of probes, intended not to be perfectly complementary to the target sequence, is also specified. These probes are the mismatch probes, and each mismatch probe includes at least one nucleotide mismatch from a perfect match probe. Accordingly, a mismatch probe and the perfect match probe to which it is identical except for one base make up a pair. As mentioned earlier, the nucleotide mismatch is preferably near the center of the mismatch probe.
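
As an illustration of the pairing described above, the following Python sketch derives a mismatch probe from each perfect match probe. It is hypothetical: the text says only that the single mismatch is preferably near the center, so the choice of index (len - 1) // 2 and the substitution of the complementary base are assumptions, as are the function names.

```python
# Watson-Crick complements, used here (by assumption) to introduce the mismatch
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def make_mismatch_probe(pm_probe: str) -> str:
    """Return a probe identical to pm_probe except at its central base.

    Assumption: the "center" is 0-based index (len - 1) // 2 and the base
    there is replaced by its complement.
    """
    probe = pm_probe.upper()
    if not probe or any(base not in COMPLEMENT for base in probe):
        raise ValueError("probe must be a non-empty A/C/G/T string")
    mid = (len(probe) - 1) // 2
    return probe[:mid] + COMPLEMENT[probe[mid]] + probe[mid + 1:]


def make_probe_pairs(pm_probes):
    """Pair each perfect match probe with its single-base mismatch partner."""
    return [(pm, make_mismatch_probe(pm)) for pm in pm_probes]
```

For a 20-mer such as those mentioned below, the mismatch falls on the 10th base; for a 25-mer it would fall on the 13th.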

[0036] The probe lengths of the perfect match probes are typically chosen to exhibit detectably greater hybridization with the target sequence relative to the mismatch probes. For example, the nucleic acid probes may all be 20-mers. However, probes of varying lengths may also be synthesized on the substrate for any number of reasons including resolving ambiguities.

[0037] Again referring to FIG. 4, at step 206 the masks for the synthesis are designed. At step 208 the software utilizes the mask design and layout information to make the DNA or other polymer chips. This step 208 will control, among other things, relative translation of a substrate and the mask, the flow of desired reagents through a flow cell, the synthesis temperature of the flow cell, and other parameters. At step 210, another piece of software is used in scanning a chip thus synthesized and exposed to a labeled target. The software controls the scanning of the chip, and stores the data thus obtained in a file that may later be utilized to extract hybridization information.

[0038] At step 212 a computer system utilizes the layout information and the fluorescence information to evaluate the hybridized nucleic acid probes on the chip. Among the important pieces of information obtained from DNA chips are the relative fluorescent intensities obtained from the perfect match probes and mismatch probes. These intensity levels are used to estimate an expression level for a gene or EST. The computer system used for analysis will preferably have available other details of the experiment including possibly the gene name, gene sequence, probe sequences, probe locations on the substrate, and the like.

[0039] According to the present invention, at step 214, the same computer system used for analysis or another one displays the expression level information in a format useful for identifying genes of interest. The visualized expression level information may include information collected from multiple applications of one or more previous steps of FIG. 4.

[0040] FIG. 5 is a flowchart describing steps of estimating an expression level for a particular gene and determining whether the expression level is sufficiently high to be displayed. At step 952, the computer system receives raw scan data of N pairs of perfect match and mismatch probes. In a preferred embodiment, the hybridization intensities are photon counts from a fluorescein labeled target that has hybridized to the probes on the substrate. For simplicity, the hybridization intensity of a perfect match probe will be designated “I_(pm)” and the hybridization intensity of a mismatch probe will be designated “I_(mm).”

[0041] Hybridization intensities for a pair of probes are retrieved at step 954. The background signal intensity is subtracted from each of the hybridization intensities of the pair at step 956. Background subtraction can also be performed on all the raw scan data at the same time.
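
Background subtraction at step 956 could be sketched as below. The text does not say how the background intensity is obtained, so `estimate_background`, which averages the dimmest fraction of all cell intensities, is a hypothetical choice; only the subtraction itself follows directly from the text.

```python
from statistics import fmean


def estimate_background(cell_intensities, fraction=0.02):
    """Hypothetical background estimate: mean of the dimmest `fraction` of cells."""
    ordered = sorted(cell_intensities)
    k = max(1, int(len(ordered) * fraction))  # always use at least one cell
    return fmean(ordered[:k])


def subtract_background(pairs, background):
    """Step 956: subtract the background from both intensities of each (I_pm, I_mm) pair."""
    return [(i_pm - background, i_mm - background) for i_pm, i_mm in pairs]
```

Applying `subtract_background` to the full list of pairs at once corresponds to the option of processing all the raw scan data at the same time.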

[0042] At step 958, the hybridization intensities of the pair of probes are compared to a difference threshold (D) and a ratio threshold (R). It is determined whether the difference between the hybridization intensities of the pair (I_(pm)−I_(mm)) is greater than or equal to the difference threshold AND the quotient of the hybridization intensities of the pair (I_(pm)/I_(mm)) is greater than or equal to the ratio threshold. The difference and ratio thresholds are typically user-defined values that have been determined to produce accurate expression monitoring of a gene or genes. In one embodiment, the difference threshold is 20 and the ratio threshold is 1.2.

[0043] If I_(pm)−I_(mm)>=D and I_(pm)/I_(mm)>=R, the value NPOS is incremented at step 960. In general, NPOS is a value that indicates the number of pairs of probes which have hybridization intensities indicating that the gene is likely expressed. NPOS is utilized in a determination of the expression of the gene.

[0044] At step 962, it is determined if I_(mm)−I_(pm)>=D and I_(mm)/I_(pm)>=R. If these expressions are true, the value NNEG is incremented at step 964. In general, NNEG is a value that indicates the number of pairs of probes which have hybridization intensities indicating that the gene is likely not expressed. NNEG, like NPOS, is utilized in a determination of the expression of the gene.

[0045] For each pair whose hybridization intensities indicate either that the gene is expressed or that it is not expressed (that is, each pair that incremented NPOS or NNEG), a log ratio value (LR) and an intensity difference value (IDIF) are calculated at step 966. LR is calculated as the log of the quotient of the hybridization intensities of the pair (I_(pm)/I_(mm)). IDIF is calculated as the difference between the hybridization intensities of the pair (I_(pm)−I_(mm)). If there is a next pair of hybridization intensities at step 968, they are retrieved at step 954.
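
The per-pair loop of steps 954 through 968 can be sketched in Python as follows. This is a minimal, hypothetical sketch: the names `classify_pair`, `summarize_pairs`, and `PairSummary` are illustrative, the defaults D = 20 and R = 1.2 are the values from the embodiment above, and two details the text leaves open are assumptions: LR uses the natural logarithm, and background-corrected intensities below a small floor are raised to that floor so the quotient and logarithm stay defined.

```python
import math
from dataclasses import dataclass, field


@dataclass
class PairSummary:
    n: int = 0                                  # N: pairs examined
    npos: int = 0                               # pairs suggesting "expressed"
    nneg: int = 0                               # pairs suggesting "not expressed"
    lrs: list = field(default_factory=list)     # LR of each qualifying pair
    idifs: list = field(default_factory=list)   # IDIF of each qualifying pair


def classify_pair(i_pm, i_mm, d=20.0, r=1.2):
    """Steps 958-964: +1 increments NPOS, -1 increments NNEG, 0 neither."""
    if i_pm - i_mm >= d and i_pm / i_mm >= r:
        return 1
    if i_mm - i_pm >= d and i_mm / i_pm >= r:
        return -1
    return 0


def summarize_pairs(pairs, d=20.0, r=1.2, floor=1.0):
    """Accumulate NPOS, NNEG, LR and IDIF over background-corrected pairs.

    `floor` is an assumption: it keeps ratios and logs defined when
    background subtraction leaves an intensity at or below zero.
    """
    summary = PairSummary()
    for raw_pm, raw_mm in pairs:
        summary.n += 1
        i_pm, i_mm = max(raw_pm, floor), max(raw_mm, floor)
        vote = classify_pair(i_pm, i_mm, d, r)
        if vote == 1:
            summary.npos += 1
        elif vote == -1:
            summary.nneg += 1
        else:
            continue  # inconclusive pair: no LR or IDIF recorded
        summary.lrs.append(math.log(i_pm / i_mm))  # step 966: LR (natural log assumed)
        summary.idifs.append(i_pm - i_mm)          # step 966: IDIF
    return summary
```

For example, the pairs (300, 100), (50, 45) and (40, 200) yield N = 3, NPOS = 1, NNEG = 1, and IDIF values 200 and -160; the middle pair fails both tests.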

[0046] At step 972, a decision matrix is utilized to indicate if the gene is expressed. The decision matrix utilizes the values N, NPOS, NNEG, LR (multiple LRs), and IDIF (multiple IDIFs). The following four assignments are performed:

P1=NPOS/NNEG

P2=NPOS/N

P3=SUM(LR)/N

P4=SUM(IDIF)/N
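
Given N, NPOS, NNEG and the lists of LR and IDIF values, the four assignments of step 972 can be written as the following Python sketch. The function name is hypothetical, and the handling of NNEG = 0, where P1 would otherwise divide by zero, is an assumption the text does not address.

```python
import math


def p_values(n, npos, nneg, lrs, idifs):
    """Compute P1..P4 of the step-972 decision matrix.

    Assumption: with NNEG = 0, P1 is reported as infinity when NPOS > 0
    and as 0.0 when NPOS is also 0.
    """
    if n <= 0:
        raise ValueError("N must be positive")
    if nneg:
        p1 = npos / nneg
    else:
        p1 = math.inf if npos else 0.0
    p2 = npos / n          # P2 = NPOS / N
    p3 = sum(lrs) / n      # P3 = SUM(LR) / N
    p4 = sum(idifs) / n    # P4 = SUM(IDIF) / N
    return p1, p2, p3, p4
```

Note that P3 and P4 divide by N, the total number of pairs, even though LR and IDIF are only recorded for pairs that incremented NPOS or NNEG.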

[0047] These P values are then utilized to determine if the gene is expressed and if the expression level should be displayed. In a preferred embodiment, the expression level of a gene should be displayed if:

P1>2.2

P2>0.3

P3>0.8

P4>30
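
The display test of the preferred embodiment can be expressed as a small predicate. The text lists four inequalities without stating how they combine; this hypothetical sketch assumes all four must hold, and the thresholds are parameters so other values can be tried.

```python
def should_display(p1, p2, p3, p4, t1=2.2, t2=0.3, t3=0.8, t4=30.0):
    """Return True if P1>2.2, P2>0.3, P3>0.8 and P4>30 (strict inequalities).

    Assumption: all four conditions are required together.
    """
    return p1 > t1 and p2 > t2 and p3 > t3 and p4 > t4
```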

[0048] Once all the pairs of probes have been processed and the expression of the gene indicated, an average of the IDIF values for the probe pairs that incremented NPOS or NNEG is calculated at step 975 and is utilized as the expression level. Of course, other values, including one of P1 through P4, could be used to indicate expression level.
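
Step 975 reduces to an average. In this minimal sketch (function name hypothetical), `idifs` holds the IDIF values of the pairs that incremented NPOS or NNEG; returning `None` when no pair qualified is an assumption, since the text does not cover that case.

```python
from statistics import fmean


def expression_level(idifs):
    """Step 975: mean IDIF of the qualifying pairs, or None if there are none."""
    return fmean(idifs) if idifs else None
```

Because pairs that incremented NNEG contribute negative IDIF values, the resulting level can be negative for a gene that is likely not expressed.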

[0049] For simplicity, FIG. 5 was described in reference to a single gene or EST. However, the visualization system of the present invention displays expression results for many genes or ESTs to facilitate discovery of those of interest. Furthermore, the present invention contemplates display of expression levels of a single gene or EST as collected from two or more different samples such as tissue samples. The sample sources preferably differ in some characteristic. It will be understood that when the term “sample” is used herein, measurements made on a single “sample” can be based on an aggregation of multiple sample collection events or even multiple organisms.

[0050] FIG. 6 shows a screen display illustrating gene expression levels for multiple genes as collected from two tissue samples. A displayed horizontal axis 1002 represents expression level measured in one or more nucleic acid samples taken from the first tissue sample. A displayed vertical axis 1004 represents expression level in one or more nucleic acid samples taken from the second tissue sample. Each of marks 1006 represents a particular gene whose expression level has been measured in both the first and second tissue samples. Each mark 1006 is placed at a distance from vertical axis 1004 corresponding to expression level in the first tissue sample and at a distance from horizontal axis 1002 corresponding to expression level in the second tissue sample.
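
Positioning a mark as described can be sketched as a linear mapping from the two expression levels to screen coordinates. The origin, scale factor, linear (rather than logarithmic) scaling, and the `Mark` record are illustrative assumptions; screen y coordinates usually grow downward, so the second level is subtracted from the origin.

```python
from dataclasses import dataclass


@dataclass
class Mark:
    gene_id: str
    x_px: float  # horizontal screen coordinate
    y_px: float  # vertical screen coordinate (grows downward)


def place_marks(levels, origin=(50.0, 450.0), px_per_unit=0.5):
    """Position one mark per gene.

    `levels` maps gene_id -> (level in first sample, level in second sample);
    `origin` is the (hypothetical) screen point where axes 1002 and 1004 cross.
    """
    ox, oy = origin
    marks = []
    for gene_id, (first, second) in levels.items():
        x = ox + first * px_per_unit   # distance from vertical axis 1004
        y = oy - second * px_per_unit  # distance from horizontal axis 1002
        marks.append(Mark(gene_id, x, y))
    return marks
```

A logarithmic mapping could be substituted if expression levels span several orders of magnitude, but the text does not specify one.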

[0051] The expression levels used for determining the position of marks 1006 are preferably taken from the result of step 975. The position of each of marks 1006 depends on two iterations of the steps of FIG. 5, once for the nucleic acid sample taken from the first tissue sample and once for the nucleic acid sample taken from the second tissue sample. However, a mark is preferably displayed only if at least one of the samples meets the threshold criteria at step 972.
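
The display rule can be sketched as a filter over per-gene results. The input layout, which pairs each sample's expression level with whether it passed step 972, is a hypothetical choice for illustration.

```python
def marks_to_display(results):
    """Keep genes for which at least one sample met the step-972 criteria.

    `results` maps gene_id -> ((level_1, passed_1), (level_2, passed_2));
    the returned dict maps gene_id -> (level_1, level_2) for plotting.
    """
    return {
        gene: (first[0], second[0])
        for gene, (first, second) in results.items()
        if first[1] or second[1]
    }
```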

[0052] In the depicted representative screen display, the first tissue sample is a cancerous tissue sample and the second tissue sample is a normal tissue sample. The individual marks represent the expression levels of selected genes in both cancerous and normal tissue. A first group of marks 1008 represents genes that are neither tumor suppressors nor oncogenes, since their expression levels are roughly similar for both normal and cancerous tissue. These marks 1008 fall roughly along a line which is rotated 45 degrees from each of the axes. A second group of marks 1010 represents genes that are likely oncogenes, since their expression levels are found to be significantly higher in cancerous tissue than in normal tissue. A third group of marks 1012 represents genes that are likely tumor suppressors, since their expression levels are found to be significantly higher in normal tissue than in cancerous tissue. It will be appreciated that expression levels for large numbers of genes can be reviewed at once to discover the oncogenes and tumor suppressors.
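
One way to sort marks into the three groups is by how far each lies from the 45-degree line, measured as the ratio of cancerous to normal expression. This is a sketch only: the two-fold cut-off for “significantly higher” and the floor that keeps the ratio defined for low or negative levels are assumptions, not values from the text.

```python
def classify_mark(cancer_level, normal_level, fold=2.0, floor=1.0):
    """Assign a mark to one of the three groups of FIG. 6 (hypothetical cut-offs)."""
    cancer = max(cancer_level, floor)  # floor keeps the ratio defined
    normal = max(normal_level, floor)
    ratio = cancer / normal
    if ratio >= fold:
        return "likely oncogene"          # group 1010: higher in cancerous tissue
    if ratio <= 1.0 / fold:
        return "likely tumor suppressor"  # group 1012: higher in normal tissue
    return "similar"                      # group 1008: near the 45-degree line
```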

[0053] Although in the depicted display the two types of tissue are normal tissue and cancerous tissue, the present invention would aid in the discovery of genes whose expression is associated with any characteristic that varies among tissue samples. For example, one can compare expression results from tissue from individuals who have been exposed to HIV but remain uninfected to tissue obtained from infected individuals to identify genes conferring resistance to HIV. One can compare expression results between tissue from plants that survive drought and plants that do not. One can compare expression levels among tissue samples at successive stages or severity levels of the same disease, among tissue samples where different ultimate outcomes of the disease (e.g., patient death or remission) are known, and among diseased tissue samples that have been subject to different treatment regimes including, e.g., chemotherapy, antisense RNA, etc. For cancers, one can compare expression levels between malignant cells and non-malignant cells. Also, expression levels can be compared among different organs, between species, and among different stages of development of an organ.

[0054] It will be appreciated that the present invention also encompasses displays with more than two dimensions. A third visual dimension can be used to illustrate expression level from a third tissue sample. The time dimension can also be used to illustrate successive groups of two or three tissue samples at successive time periods. The time dimension can also be used to correspond to tissue samples obtained at, e.g., successive stages of a disease.

[0055] Other interface methods corresponding to human senses other than sight can also be incorporated within the presentation system of the present invention. The senses may correspond to additional dimensions. For example, marks can be displayed in succession accompanied by a sound having characteristics corresponding to expression level in another tissue sample.
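
As one hypothetical realization of the sound dimension, an expression level could be mapped to a tone frequency. The sketch below, whose function name, frequency range, and geometric (log-pitch) scaling are all assumptions, spreads levels evenly in pitch and clamps values outside the given range.

```python
def level_to_frequency(level, lo, hi, f_min=220.0, f_max=880.0):
    """Map an expression level in [lo, hi] to a frequency in [f_min, f_max] Hz.

    Equal steps in level give equal musical intervals; out-of-range
    levels are clamped to the ends of the range.
    """
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    t = (min(max(level, lo), hi) - lo) / (hi - lo)  # normalized position 0..1
    return f_min * (f_max / f_min) ** t
```

With the defaults, the midpoint of the level range sounds at 440 Hz, one octave above the lowest tone and one below the highest.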

[0056] The user can employ a cursor 1014 to identify a particular mark as being of interest. Cursor 1014 can be moved to a particular mark by use of, e.g., mouse 11. Once cursor 1014 is over a mark of interest, the mark can be selected by, e.g., depression of one of mouse buttons 13. Selection of a particular mark can be facilitated by use of a zoom display feature (not shown). Once a particular mark is selected, further information is displayed about the gene represented by the mark. A special mouse can transmit a tactile sensation back to the user corresponding to expression level in a tissue sample as the user passes the mouse over a corresponding mark.

[0057] It will be appreciated that the display of FIG. 6 is not limited to expression information. The two dimensions of FIG. 6 may correspond to indicators of the presence of various polymers other than nucleic acids in two different samples. For example, each mark may correspond to a different polymer, polypeptide, or other compound. The distance of the mark from each axis would correspond to a measure of presence of the particular polymer in the sample corresponding to the axis. One possible measure is produced by fluorescently tagging polymer samples such as protein samples and exposing a probe array such as a peptide probe array to the protein samples. The fluorescent intensity of the probes will then correspond to the binding affinity of the sample to the probes. The intensity measurement or a measurement derived from the intensity measurement may then be used to position the marks of FIG. 6.

[0058] FIG. 7A shows a screen display giving information about a particular gene selected from the display of FIG. 6. A cluster number 702, a GenBank accession number 704, and a verbal description 706 for the selected gene are displayed. The user can also select a number of marks 1006 by circling them with cursor 1014. Then a list of information as shown in FIG. 7A is displayed for all the genes corresponding to the selected marks.
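
Selecting marks by circling them can be implemented by treating the cursor path as a closed polygon and testing each mark with the standard even-odd (ray-casting) rule. The names below are hypothetical, and the sketch assumes the cursor path has been sampled as a list of screen points.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, poly: Sequence[Point]) -> bool:
    """Even-odd test: count how many polygon edges a horizontal ray from p
    toward +x crosses; an odd count means p is inside."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]  # wrap around to close the path
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def marks_in_lasso(marks: Dict[str, Point], lasso: Sequence[Point]) -> List[str]:
    """Return the ids of marks enclosed by the closed cursor path `lasso`."""
    if len(lasso) < 3:
        return []
    return sorted(k for k, p in marks.items() if point_in_polygon(p, lasso))
```

The returned ids would then drive the list display of FIG. 7A for every selected gene.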

[0059] By selecting GenBank accession number 704 with another cursor (not shown), the user can direct retrieval of the GenBank information for the selected gene. If the GenBank information is not available locally, the retrieval process can include formulating a query and transmitting the query to a GenBank web site. Once the GenBank information is retrieved, it can also be displayed. FIG. 7B depicts the GenBank information for the gene identified in FIG. 7A.
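
A minimal sketch of the retrieval step follows. It assumes that a local dictionary serves as the cache and that the remote query is sent to NCBI's E-utilities `efetch` endpoint, a programmatic interface to GenBank substituted here for the unspecified "GenBank web site". The helper names are hypothetical. The `fetch` parameter allows the network call to be replaced, for example in tests.

```python
import urllib.parse
import urllib.request
from typing import Callable, Dict

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_query_url(accession: str) -> str:
    """Formulate an efetch query for the GenBank flat-file record."""
    params = {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    return EFETCH + "?" + urllib.parse.urlencode(params)


def _download(url: str) -> str:
    """Transmit the query and return the response body as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def get_genbank_record(accession: str, cache: Dict[str, str],
                       fetch: Callable[[str], str] = _download) -> str:
    """Use the locally stored record if present; otherwise query remotely
    and keep the reply for later displays."""
    if accession not in cache:
        cache[accession] = fetch(genbank_query_url(accession))
    return cache[accession]
```

In practice, NCBI's usage guidelines (request rate limits and the identifying `tool` and `email` parameters) would also apply.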

[0060] In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereunto without departing from the broader spirit and scope of the invention as set forth in the appended claims and their full scope of equivalents.

SEQUENCE LISTING
Number of sequences: 2

SEQ ID NO: 1
Length: 2691 base pairs; Type: nucleic acid; Strandedness: unknown; Topology: not relevant; Molecule type: DNA (genomic); Organism: Homo sapiens

GGAGACAGAC AGACAGCTGG CAAGAGGCAG CCTGGGGGCC ACAGCTGCTT CAGCAGACCT 60
CATGGCTGAG TGAGCCTCCC CTGGGCCCAG CACCCCACCT CAGCATGGTC CAAGCCCATG 120
GGGGGCGCTC CAGAGCACAG CCGTTGACCT TGTCTTTGGG GGCAGCCATG ACCCAGCCTC 180
CGCCTGAAAA AACGCCAGCC AAGAAGCATG TGCGACTGCA GGAGAGGCGG GGCTCCAATG 240
TGGCTCTGAT GCTGGACGTT CGGTCCCTGG GGGCCGTAGA ACCCATCTGC TCTGTGAACA 300
CACCCCGGGA GGTCACCCTA CACTTTCTGC GCACTGCTGG ACACCCCCTT ACCCGCTGGG 360
CCCTTCAGCG CCAGCCACCC AGCCCCAAGC AACTGGAAGA AGAATTCTTG AAGATCCCTT 420
CAAACTTTGT CAGCCCCGAA GACCTGGACA TCCCTGGCCA CGCCTCCAAG GACCGATACA 480
AGACCATCTT GCCAAATCCC CAGAGCCGTG TCTGTCTAGG CCGGGCACAG AGCCAGGAGG 540
ACGGAGATTA CATCAATGCC AACTACATCC GAGGCTATGA CGGGAAGGAG AAGGTCTACA 600
TTGCCACCCA GGGCCCCATG CCCAACACTG TGTCGGACTT CTGGGAGATG GTGTGGCAAG 660
AGGAAGTGTC CCTCATTGTC ATGCTCACTC AGCTCCGAGA GGGCAAGGAG AAATGTGTCC 720
ACTACTGGCC CACAGAAGAG GAAACCTATG GACCCTTCCA GATCCGCATC CAGGACATGA 780
AAGAGTGCCC AGAATACACT GTGCGGCAGC TCACCATCCA GTACCAGGAA GAGCGCCGGT 840
CAGTAAAGCA CATCCTCTTT TCGGCCTGGC CAGACCATCA GACACCAGAA TCAGCTGGGC 900
CCCTGCTGCG CCTAGTGGCA GAGGTGGAGG AGAGCCCGGA GACAGCCGCC CACCCCGGGC 960
CTATCGTAGT CCACTGCAGT GCAGGGATTG GCCGGACGGG CTGCTTCATC GCCACGCGAA 1020
TTGGCTGTCA ACAGCTGAAA GCCCGAGGAG AAGTGGACAT TCTGGGTATT GTGTGCCAAC 1080
TGCGGCTAGA CAGAGGGGGG ATGATCCAGA CGGACGAGCA GTACCAGTTC CTGCACCACA 1140
CTTTGGCCCT GTATGCAGGC CAGCTGCCTG AGGAACCCAG CCCCTGACCC CTGCCACCCT 1200
CCGGTGGCCC AGGTGCCTAC CTCCCTCAAG CCTGGGAAGT CACAGGAAGC AGCAGCAGTA 1260
AGGACAAGGG GCCGGATTCC AGGTCTTCAA CACTGGCCAC TCCTCTGCTT CCTCTGTTGG 1320
CCCCAGATGG ACAGTAAGGG GAACCTCCAA TGTCTCTCTG AACTTAAAGA CAGGAGCTGG 1380
CATTTATGAC AGACAAAGAA AGAAGCCCAG GTGTCCTGGT GTTCTCTGAG ACACTCTTTG 1440
TGAGCTTCAG TTTCCTGTTC TATAACATGA ACATAAGTGC TTAGCTGCCA TGAGGGAAAA 1500
GTAATGAGAG AAGTTTCTAG AAGCCACTCC AGCCACTCCT TCCTGGGGCT GACAAAAGGGG 1560
TGATTCCAAG ATCATCCTTC ACCCGAGGTC CTGCCCAAGC ACAGGCCAGA TGCAAGAATG 1620
GGGAAAAGTC TGGTCCTGAT CTCCAAGTCT CAACATCCTA TCAGTGACTC TGCTCCCTGA 1680
CCACACATCG GAAGGGCTGG ATGACCCCAA TCAAAAGAAA GAACAAGGAC TCTGGTTACC 1740
CTTGCCCTCC ACCCATGTGT CATAAGAGTA GGCTACAGAG GTGACCAGGC CTGGCAGTTG 1800
AAATCTCTGG AAGAGGGAAC ATGTGGGGAC TACTCAGAGG CAAAGAGGAG CTGCTCCTGC 1860
CTCCATGGTT GCTGGCCACT CCCACCAACT ACTCTTAGGG AGGCTAAGCA GTCTCTGTTT 1920
TGCTTCCATG GCTCAAATAA TACCCTGGGT ATGCAGGACC CACTATACCT TGCATTTGCT 1980
GGTACACCTA GAGAGCTTGG CTGTTTCCAA AAACAATCAG GGTCATAACC ATCCATGCAG 2040
ACATGGAGGC TCGGCTGAAC CAGGACTCCT CACTGTCTAC CTGAGAGAAT GAGCACCCCT 2100
CATCCATCTC AGCATCAACA CAATTTCCAG GGGACCTCAG GTCTACCTCA GGACTGAACG 2160
CCACACCTCA GGATTCCTCC TCCTTGAATC TGAGACTGGC TGCCCATTCT GAGATGGGGA 2220
TGAAGGTAAG ATGCCGCATC ACCAGGCACG CCGCCCCTGA CAGCTGCCTT GATACCAGCT 2280
CTCTGTGGAA ACCCCCGAGG AGTTGGATCT GGAGAACAGC TGGGCCTCCT CACTCAGGAC 2340
TTCTCTCCTG AAGAACACGC AGTGCTAAAA CTGAGGATGA TTTCCCTAAT GCTTCTGCTT 2400
GGCCTTATGG AGGAGCTGCT CCTTCCTTAC AGCCTTGGGG ATGGACTTGC CCACACCTCC 2460
ACCTCCCCTG AGCCCTGTGA GAGGCACGAC TGTCTATGCC AATGAGGCTC GGTGGGGGGC 2520
TCTCAAGTGC CTGATCCTGC CCTGGGCTCA GAGCCAGCCC AGAGGGAAGC AACTGCACAG 2580
CCCCACAGGC CCTCCCTGGC ACTGTCCCCC CAACCCCATC TCAGAGCTCA GAGGGTACAA 2640
GCTCCAGAAC AGTAACCAAG TGGGAAAATA AAGACTTCTT GGATGACTGA C 2691

SEQ ID NO: 2
Length: 360 amino acids; Type: amino acid; Strandedness: not relevant; Topology: not relevant; Molecule type: protein; Organism: Homo sapiens

Met Val Gln Ala His Gly Gly Arg Ser Arg Ala Gln Pro Leu Thr Leu 1 5 10 15
Ser Leu Gly Ala Ala Met Thr Gln Pro Pro Pro Glu Lys Thr Pro Ala 20 25 30
Lys Lys His Val Arg Leu Gln Glu Arg Arg Gly Ser Asn Val Ala Leu 35 40 45
Met Leu Asp Val Arg Ser Leu Gly Ala Val Glu Pro Ile Cys Ser Val 50 55 60
Asn Thr Pro Arg Glu Val Thr Leu His Phe Leu Arg Thr Ala Gly His 65 70 75 80
Pro Leu Thr Arg Trp Ala Leu Gln Arg Gln Pro Pro Ser Pro Lys Gln 85 90 95
Leu Glu Glu Glu Phe Leu Lys Ile Pro Ser Asn Phe Val Ser Pro Glu 100 105 110
Asp Leu Asp Ile Pro Gly His Ala Ser Lys Asp Arg Tyr Lys Thr Ile 115 120 125
Leu Pro Asn Pro Gln Ser Arg Val Cys Leu Gly Arg Ala Gln Ser Gln 130 135 140
Glu Asp Gly Asp Tyr Ile Asn Ala Asn Tyr Ile Arg Gly Tyr Asp Gly 145 150 155 160
Lys Glu Lys Val Tyr Ile Ala Thr Gln Gly Pro Met Pro Asn Thr Val 165 170 175
Ser Asp Phe Trp Glu Met Val Trp Gln Glu Glu Val Ser Leu Ile Val 180 185 190
Met Leu Thr Gln Leu Arg Glu Gly Lys Glu Lys Cys Val His Tyr Trp 195 200 205
Pro Thr Glu Glu Glu Thr Tyr Gly Pro Phe Gln Ile Arg Ile Gln Asp 210 215 220
Met Lys Glu Cys Pro Glu Tyr Thr Val Arg Gln Leu Thr Ile Gln Tyr 225 230 235 240
Gln Glu Glu Arg Arg Ser Val Lys His Ile Leu Phe Ser Ala Trp Pro 245 250 255
Asp His Gln Thr Pro Glu Ser Ala Gly Pro Leu Leu Arg Leu Val Ala 260 265 270
Glu Val Glu Glu Ser Pro Glu Thr Ala Ala His Pro Gly Pro Ile Val 275 280 285
Val His Cys Ser Ala Gly Ile Gly Arg Thr Gly Cys Phe Ile Ala Thr 290 295 300
Arg Ile Gly Cys Gln Gln Leu Lys Ala Arg Gly Glu Val Asp Ile Leu 305 310 315 320
Gly Ile Val Cys Gln Leu Arg Leu Asp Arg Gly Gly Met Ile Gln Thr 325 330 335
Asp Glu Gln Tyr Gln Phe Leu His His Thr Leu Ala Leu Tyr Ala Gly 340 345 350
Gln Leu Pro Glu Glu Pro Ser Pro 355 360

What is claimed is:
 1. A computer-implemented method of presenting expression level information as collected from first and second samples, said method comprising the steps of: displaying a first axis corresponding to expression level in said first sample; displaying a second axis substantially perpendicular to said first axis, said second axis corresponding to expression level in said second sample; and for a selected expressed sequence, displaying a mark at a position, wherein said position is selected relative to said first axis in accordance with an expression level of said selected expressed sequence in said first sample and relative to said second axis in accordance with an expression level of said selected expressed sequence in said second sample.
 2. The method of claim 1 wherein said selected expressed sequence comprises a gene.
 3. The method of claim 1 wherein said selected expressed sequence comprises a portion of a gene.
 4. The method of claim 1 further comprising the step of repeating said displaying a mark step for a plurality of selected expressed sequences.
 5. The method of claim 1 further comprising the steps of: monitoring said expression level of said expressed sequence in said first sample and said second sample.
 6. The method of claim 3 wherein said monitoring step for one of said samples comprises substeps of: inputting a plurality of hybridization intensities of pairs of perfect match and mismatch probes, said perfect match probes being perfectly complementary to a target nucleic acid sequence indicative of expression of said selected gene and said mismatch probes having at least one base mismatch with said target sequence, and said hybridization intensities indicating hybridization affinity between said perfect match and mismatch probes and a sample nucleic acid sequence from said one of said samples; comparing the hybridization intensities of each pair of perfect match probe and mismatch probe; and generating said expression level for said expressed sequence and said one of said samples responsive to results of said comparing step.
 7. The method of claim 6 further comprising the step of: comparing a difference between hybridization intensities of perfect match and mismatch probes at a base position to a difference threshold.
 8. The method of claim 7 further comprising the step of: comparing a quotient of hybridization intensities of perfect match and mismatch probes at a base position to a ratio threshold.
 9. The method of claim 6 further comprising the steps of: a) counting a probe pair as a positive probe pair to increment a positive probe pair count if a perfect match probe intensity minus a mismatch probe intensity exceeds a difference threshold and said perfect match probe intensity divided by said mismatch probe intensity exceeds a ratio threshold; b) counting said probe pair as a negative probe pair to increment a negative probe pair count if said mismatch probe intensity minus said perfect match probe intensity exceeds said difference threshold and said mismatch probe intensity divided by said perfect match probe intensity exceeds said ratio threshold; and c) computing a logarithmic ratio of said perfect match probe intensity to said mismatch probe intensity.
 10. The method of claim 9 further comprising the steps of: repeating said a), b), and c) steps for each of said probe pairs, accumulating a sum of differences of said perfect match and mismatch probe intensities for probe pairs that cause; and determining an expression level of said selected expressed sequence to be an average of said differences.
 11. The method of claim 1 further comprising the steps of: receiving user input selecting said mark; and in response to said user input, displaying information about said selected expressed sequence.
 12. The method of claim 11 further comprising the steps of: in response to said user input, displaying information about said selected expressed sequence.
 13. The method of claim 12 wherein said information about said selected expressed sequence comprises a GenBank accession number.
 14. The method of claim 12 wherein said information about said selected expressed sequence comprises a GenBank database record for said selected expressed sequence.
 15. The method of claim 1 wherein said first sample and said second sample are collected from tissue samples differing in a particular characteristic.
 16. The method of claim 15 wherein said particular characteristic comprises presence of disease.
 17. The method of claim 15 wherein said particular characteristic comprises a treatment strategy for a disease.
 18. The method of claim 1 wherein said particular characteristic is a stage of a disease.
 19. The method of claim 1 further comprising the step of: displaying a third axis substantially perpendicular to said first axis and to said second axis in a three-dimensional display environment wherein said position of said mark is further selected relative to said third axis in accordance with an expression level of said selected expressed sequence in a third sample.
 20. A computer-implemented method of presenting sample analysis information comprising the steps of: displaying a first axis corresponding to a concentration of a compound in a first sample as determined by monitoring binding of said compound to a selected polymer having binding affinity to said compound; displaying a second axis substantially perpendicular to said first axis, said second axis corresponding to a concentration of said compound in said second sample as determined by monitoring binding of said compound to said selected polymer; and displaying a mark at a position, wherein said position is selected relative to said first axis in accordance with said concentration in said first sample and relative to said second axis in accordance with said concentration in said second sample.
 21. The method of claim 20 wherein said selected polymer comprises a nucleic acid sequence.
 22. The method of claim 20 wherein said selected polymer comprises a protein.
 23. The method of claim 21 further comprising the step of: obtaining said concentration of said compound in said first sample by exposing said first sample to a plurality of nucleic acid probes.
 24. The method of claim 22 further comprising the step of: obtaining said concentration of said compound in said first sample by exposing said first sample to a plurality of peptide probes.
 25. A computer program product for presenting expression level information as collected from first and second samples, said product comprising: code for displaying a first axis corresponding to expression level in said first sample; code for displaying a second axis substantially perpendicular to said first axis, said second axis corresponding to expression level in said second sample; code for, for a selected expressed sequence, displaying a mark at a position, wherein said position is selected relative to said first axis in accordance with an expression level of said selected expressed sequence in said first sample and relative to said second axis in accordance with an expression level of said selected expressed sequence in said second sample; and a computer-readable storage medium for storing the codes.
 26. The product of claim 25 wherein said selected expressed sequence comprises a gene.
 27. The product of claim 25 wherein said selected expressed sequence comprises a portion of a gene.
 28. The product of claim 25 further comprising code for repeatedly applying said displaying a mark code for a plurality of selected expressed sequences.
 29. The product of claim 25 further comprising: code for monitoring said expression level of said expressed sequence in said first sample and said second sample.
 30. The product of claim 27 wherein said monitoring step for one of said samples comprises: code for inputting a plurality of hybridization intensities of pairs of perfect match and mismatch probes, said perfect match probes being perfectly complementary to a target nucleic acid sequence indicative of expression of said selected gene and said mismatch probes having at least one base mismatch with said target sequence, and said hybridization intensities indicating hybridization affinity between said perfect match and mismatch probes and a sample nucleic acid sequence from said one of said samples; comparing the hybridization intensities of each pair of perfect match probe and mismatch probe; and generating said expression level for said expressed sequence and said one of said samples responsive to results of said comparing step.
 31. The product of claim 30 further comprising: code for comparing a difference between hybridization intensities of perfect match and mismatch probes at a base position to a difference threshold.
 32. The product of claim 31 further comprising: code for comparing a quotient of hybridization intensities of perfect match and mismatch probes at a base position to a ratio threshold.
 33. The product of claim 30 further comprising: a) code for counting a probe pair as a positive probe pair to increment a positive probe pair count if a perfect match probe intensity minus a mismatch probe intensity exceeds a difference threshold and said perfect match probe intensity divided by said mismatch probe intensity exceeds a ratio threshold; b) code for counting said probe pair as a negative probe pair to increment a negative probe pair count if said mismatch probe intensity minus said perfect match probe intensity exceeds said difference threshold and said mismatch probe intensity divided by said perfect match probe intensity exceeds said ratio threshold; and c) code for computing a logarithmic ratio of said perfect match probe intensity to said mismatch probe intensity.
 34. The product of claim 33 further comprising: code for repeatedly applying said a), b), and c) codes for each of said probe pairs, accumulating a sum of differences of said perfect match and mismatch probe intensities for probe pairs that cause; and code for determining an expression level of said selected expressed sequence to be an average of said differences.
 35. The product of claim 25 further comprising: code for receiving user input selecting said mark; and code for, in response to said user input, displaying information about said selected expressed sequence.
 36. The product of claim 35 further comprising: code for, in response to said user input, displaying information about said selected expressed sequence.
 37. The product of claim 36 wherein said information about said selected expressed sequence comprises a GenBank accession number.
 38. The product of claim 36 wherein said information about said selected expressed sequence comprises a GenBank database record for said selected expressed sequence.
 39. The product of claim 25 wherein said first sample and said second sample are collected from tissue samples differing in a particular characteristic.
 40. The product of claim 39 wherein said particular characteristic comprises presence of disease.
 41. The product of claim 39 wherein said particular characteristic comprises a treatment strategy for a disease.
 42. The product of claim 25 wherein said particular characteristic is a stage of a disease.
 43. The product of claim 25 further comprising the step of: displaying a third axis substantially perpendicular to said first axis and to said second axis in a three-dimensional display environment wherein said position of said mark is further selected relative to said third axis in accordance with an expression level of said selected expressed sequence in a third sample.
 44. A computer program product for presenting sample analysis information comprising: code for displaying a first axis corresponding to a concentration of a compound in a first sample as determined by monitoring binding of said compound to a selected polymer having binding affinity to said compound; code for displaying a second axis substantially perpendicular to said first axis, said second axis corresponding to concentration of said compound in a second sample as determined by monitoring binding of said compound to said selected polymer; code for displaying a mark at a position, wherein said position is selected relative to said first axis in accordance with said concentration in said first sample and relative to said second axis in accordance with said concentration in said second sample; and a computer-readable storage medium that stores the codes.
 45. The product of claim 44 wherein said selected polymer comprises a nucleic acid sequence.
 46. The product of claim 44 wherein said selected polymer comprises a protein.
 47. A computer system comprising a display, a processor, and a memory that stores instructions for configuring said processor to: display a first axis corresponding to expression level in said first sample; display a second axis substantially perpendicular to said first axis, said second axis corresponding to expression level in said second sample; and for a selected expressed sequence, display a mark at a position, wherein said position is selected relative to said first axis in accordance with an expression level of said selected expressed sequence in said first sample and relative to said second axis in accordance with an expression level of said selected expressed sequence in said second sample.
 48. A computer system comprising a display, a processor, and a memory that stores instructions for configuring said processor to: display a first axis corresponding to a concentration of a compound in a first sample as determined by monitoring binding of said compound to a selected polymer having binding affinity to said compound; display a second axis substantially perpendicular to said first axis, said second axis corresponding to a concentration of said compound in said second sample as determined by monitoring binding of said compound to said selected polymer; and display a mark at a position, wherein said position is selected relative to said first axis in accordance with said concentration in said first sample and relative to said second axis in accordance with said concentration in said second sample.
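
The probe-pair analysis recited in claims 6 through 10 can be sketched in Python as below. This is an illustrative reading, not the patentee's implementation. The logarithm base (10) is an assumption. The wording of claim 10 is incomplete about which probe pairs contribute to the sum of differences, so that choice is left to a hypothetical caller-supplied predicate `include`, which by default admits every pair. All names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

# One probe pair: (perfect-match intensity, mismatch intensity).
ProbePair = Tuple[float, float]


@dataclass
class PairStats:
    positive: int = 0
    negative: int = 0
    log_ratios: List[float] = field(default_factory=list)
    avg_diff: float = 0.0


def analyze_probe_pairs(
    pairs: Sequence[ProbePair],
    diff_threshold: float,
    ratio_threshold: float,
    include: Callable[[float, float], bool] = lambda pm, mm: True,
) -> PairStats:
    """Count positive/negative pairs, record log ratios, and average PM - MM."""
    stats = PairStats()
    diffs: List[float] = []
    for pm, mm in pairs:
        if pm <= 0 or mm <= 0:
            raise ValueError("intensities must be positive to form ratios and logs")
        # a) positive pair: PM exceeds MM by both the difference and ratio tests
        if pm - mm > diff_threshold and pm / mm > ratio_threshold:
            stats.positive += 1
        # b) negative pair: MM exceeds PM by the same two tests
        if mm - pm > diff_threshold and mm / pm > ratio_threshold:
            stats.negative += 1
        # c) logarithmic ratio of PM to MM (base 10 is an assumption)
        stats.log_ratios.append(math.log10(pm / mm))
        # Which pairs enter the sum is incomplete in claim 10; `include` decides.
        if include(pm, mm):
            diffs.append(pm - mm)
    # Expression level: the average of the accumulated differences.
    stats.avg_diff = sum(diffs) / len(diffs) if diffs else 0.0
    return stats
```

For example, with a difference threshold of 20 and a ratio threshold of 1.5, the pairs (200, 50), (40, 160), and (100, 90) yield one positive pair, one negative pair, and one pair that is neither.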